

PCT 



WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification 6: G06F 17/20, 17/27, G10L 5/00



A1



(11) International Publication Number: WO 98/09228

(43) International Publication Date: 5 March 1998 (05.03.98)



(21) International Application Number: PCT/US97/15388

(22) International Filing Date: 28 August 1997 (28.08.97) 



(30) Priority Data:

60/025,145        29 August 1996 (29.08.96)    US
Not furnished     27 August 1997 (27.08.97)    US



(71) Applicant: BCL COMPUTERS, INC. [US/US]; 650 Saratoga Avenue, San Jose, CA 95129 (US).

(72) Inventor: ALAM, Hassan; 1090 Leslie Drive, San Jose, CA 95089 (US).

(74) Agent: SCHREIBER, Donald, E.; P.O. Box 64150, Sunnyvale, 
CA 94088-4150 (US). 



(81) Designated States: AU, CA, CN, JP, NZ, RU, European patent (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report. 



(54) Title: NATURAL-LANGUAGE SPEECH CONTROL 







(57) Abstract

A natural-language speech control method (20) produces a command (34) for controlling the operation of a digital computer (36) from words spoken in a natural language. An audio signal that represents the spoken words is processed to generate textual digital-computer-data (24). The textual digital-computer-data (24) is then processed by a natural-language-syntactic-parser (26) to produce a parsed sentence in a logical form of the command (28). The parsed sentence is then processed by a semantic compiler (32) to generate the command (34) that controls the operation of the digital computer (36). The command (34) is expressed as a natural-language sentence that has an implied second person singular pronoun subject and an active voice, present tense verb. The preferred method uses a principles-and-parameters (P-and-P), Government-and-Binding-based (GB-based) natural-language-syntactic-parser (26) for resolving ambiguous syntactic structures.



FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT:

AL Albania; AM Armenia; AT Austria; AU Australia; AZ Azerbaijan; BA Bosnia and Herzegovina; BB Barbados; BE Belgium; BF Burkina Faso; BG Bulgaria; BJ Benin; BR Brazil; BY Belarus; CA Canada; CF Central African Republic; CG Congo; CH Switzerland; CI Côte d'Ivoire; CM Cameroon; CN China; CU Cuba; CZ Czech Republic; DE Germany; DK Denmark; EE Estonia; ES Spain; FI Finland; FR France; GA Gabon; GB United Kingdom; GE Georgia; GH Ghana; GN Guinea; GR Greece; HU Hungary; IE Ireland; IL Israel; IS Iceland; IT Italy; JP Japan; KE Kenya; KG Kyrgyzstan; KP Democratic People's Republic of Korea; KR Republic of Korea; KZ Kazakstan; LC Saint Lucia; LI Liechtenstein; LK Sri Lanka; LR Liberia; LS Lesotho; LT Lithuania; LU Luxembourg; LV Latvia; MC Monaco; MD Republic of Moldova; MG Madagascar; MK The former Yugoslav Republic of Macedonia; ML Mali; MN Mongolia; MR Mauritania; MW Malawi; MX Mexico; NE Niger; NL Netherlands; NO Norway; NZ New Zealand; PL Poland; PT Portugal; RO Romania; RU Russian Federation; SD Sudan; SE Sweden; SG Singapore; SI Slovenia; SK Slovakia; SN Senegal; SZ Swaziland; TD Chad; TG Togo; TJ Tajikistan; TM Turkmenistan; TR Turkey; TT Trinidad and Tobago; UA Ukraine; UG Uganda; US United States of America; UZ Uzbekistan; VN Viet Nam; YU Yugoslavia; ZW Zimbabwe.




NATURAL-LANGUAGE SPEECH CONTROL

Technical Field 

The present invention relates generally to the technical field of digital computer speech recognition and, more particularly, to recognizing and executing commands spoken in natural-language.

Background Art 

Currently, humans communicate with a computer primarily tactilely via keyboard or pointing device with commands that must strictly conform to computer program syntax. However, speech is the most natural method for humans to express commands. To improve speed, usability and user acceptance of computers there exists a well-recognized need for a voice-based command system that responds appropriately to only a general description of the tasks to be performed by the computer. Some systems have been demonstrated which permit speaking conventional computer commands. For example, an MS DOS command for copying all Word for Windows files in one directory into another directory named "john" might be spoken as follows.

Copy *.doc john 

However, to be truly effective a voice-based command system needs not only to translate a spoken command into a sequence of words, but also to interpret natural-language sentences such as that set forth below as a coherent command recognizable to and executable by the computer.

Copy all word files to John's directory. 




Because natural-language allows a computer user to prescribe a set of smaller tasks with a single sentence, an ability to handle high-level, abstract commands is key to an effective voice-based command system. The ability to handle high-level, abstract commands makes a voice-based command interface easy to use and potentially faster than keyboard or pointing device based computer control. Moreover, under certain circumstances a voice-based command system is essential for controlling a computer's operation, such as for physically handicapped individuals, and for other individuals while they perform tasks which occupy both of their hands.

Voice control of computers permits commands with a high level of abstraction and complexity. For instance, in giving directions we might simply say "turn left at the light". Presently, this type of command is possible only when communicating with other humans. Communicating with computers or equipment requires a series of commands at a much lower level of abstraction. For instance, the previous instruction would at a minimum need to be expanded as follows.

Go Straight
Find Light
Turn Left
Go Straight

Similarly, for a jet aircraft landing on the deck of an aircraft carrier, the command "abort landing" would at a minimum translate to the following set of commands.

Afterburner On
Steady Course
Retract Flaps




Retract Speed Brakes
Retract Landing Gear

Issuing the preceding sequence of commands by voice requires too much time, and would therefore probably result in a crash. To be effective, the pilot needs to be able to control the aircraft with one high-level command, in this case "abort landing", and the computer must execute all the commands needed to accomplish this task.

In actual practice, each of the natural-language commands set forth above needs a set of sub-instructions. Thus, despite the present ability of computer technology to transcribe speech into words, real-time voice control of equipment has, thus far, remained an elusive goal. Conversely, an ability to issue spoken natural-language commands permits communicating with equipment ranging from computers to aircraft at a higher level of abstraction than is presently possible. A natural-language voice interface will allow applications such as voice control of vehicles, and voice control of computer applications.

There are three basic approaches to natural-language syntactic processing: simple grammar, statistical, and Government-and-Binding-based (GB-based). Of these three approaches, simple grammars are used for simple, uncomplicated syntax. Examples of grammars for such a syntax include early work such as the psychiatrist program 'Eliza'. However, writing a full grammar for any significant portion of a natural-language is very complicated. For specialized domains, the grammar-based approach is abandoned for a statistical one as described by Carl G. de Marcken, Parsing the LOB Corpus, Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, June 1990.




Statistical approaches look at word patterns and word co-occurrence and attempt to parse natural-language sentences based on the likelihood of such patterns. Statistical approaches use a variety of methods including neural networks and word distribution. As with any other statistical pattern matching approach, this approach is ultimately limited by an error rate that cannot easily be reduced further. Also, it is very difficult to handle wide varieties of linguistic phenomena, such as scrambling, NP-movement, and binding between question words and empty categories, through statistical natural-language processing.

Approaches to natural-language processing based on Noam Chomsky's Government and Binding theories, as described in Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, Mass.: MIT Press, offer a possibility of a more robust approach to natural-language parsing by developing computational methods based on a linguistic theory of a universal language. Head-driven Phrase Structure Grammar (HPSG) is a major offshoot of GB theory and a number of such parsers are being developed. The GB-based approach can find syntactic structure in scrambled sentences such as 'I play football' and 'football, I play'. The GB-based approach also handles NP-movement, which is exemplified by a passive sentence such as 'Football was played.' having the deeper structure '[ [] [was played [football]]]'. In parsing this natural-language sentence the noun phrase (NP) 'football' moves from its original position after the verb 'was played' to the front of the sentence, because otherwise the sentence would have no subject. Binding between question words and empty categories is exemplified by a question such as 'Whom will he invite?'




The GB approach finds that this sentence has the deep structure '[he will [invite [whom]]]'. The question word 'whom' binds the empty trace that it leaves when it moves to the front of the sentence.

The principle-based parsing technique described by Robert C. Berwick in Principles of Principle-Based Parsing, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic Publishers, pp. 1-37 (1991), and by Sandiway Fong in The Computational Implementation of Principle-Based Parsers, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic Publishers, pp. 65-83 (1991), offers a possibility of a more robust approach. Principle-based parsing uses a few principles for filtering sentences. A sequence of principle-based filters eliminates illegal parses and the remaining parse is the legal one. A primary difficulty with this method is that it generates too many parses, which makes the GB-based approach computationally slow. Methods for improving the performance of GB-based parsing include:

1. appropriately sequencing the principle-based filters to reduce over-generation, as described by Fong; or

2. 'co-routining' by interleaving the actual parsing mechanism with the principle filters, as described by Bonnie Jean Dorr in Principle-Based Parsing for Machine Translation, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic Publishers, pp. 153-183 (1991).




Disclosure of Invention 

An object of the present invention is to provide a voice-based command system that can translate commands spoken in natural-language into commands accepted by a computer program.

An object of the present invention is to provide a voice-based command system that can translate commands spoken in natural-language into commands accepted by different computer programs.

Another object of the present invention is to provide a natural-language-syntactic-parser that resolves ambiguities in a voice command.

Another object of the present invention is to provide a command interpreter that handles incomplete commands gracefully by interpreting the command as far as possible, and by retaining information from the command for subsequent clarification.

Another object of the present invention is to provide a voice-based command system that is efficient in any operating environment, and that is portable with minor modifications to other operating environments.

Briefly, the present invention is a natural-language speech control method that produces a command for controlling the operation of a digital computer from words spoken in a natural-language. The method includes the step of processing an audio signal that represents the spoken words to generate textual digital-computer-data. The textual digital-computer-data contains representations of the words in the command spoken in a natural-language. The textual digital-computer-data is then processed by a natural-language-syntactic-parser to produce a parsed sentence.




The parsed sentence consists of a string of words with each word being associated with a part of speech in the parsed sentence. The string of words is then preferably processed by a semantic compiler to generate the command that controls the operation of the digital computer.
The preferred embodiment of the present invention uses a GB-based natural-language-syntactic-parser which reveals implied syntactic structure in English language sentences. Hence the GB-based natural-language-syntactic-parser can resolve ambiguous syntactic structures better than alternative methods of natural-language processing. Using a generalized principles-and-parameters GB-based natural-language-syntactic-parser for the natural-language speech control method provides a customizable and portable parser that can be tailored to different operating environments with minor modification. With generalized principles-and-parameters, a GB-based approach can describe a large syntax and vocabulary relatively easily, and hence provides greater robustness than other approaches to natural-language processing.

These and other features, objects and advantages will be understood or apparent to those of ordinary skill in the art from the following detailed description of the preferred embodiment as illustrated in the various drawing figures.



Brief Description of Drawings 

FIG. 1 is a flow diagram illustrating the overall approach to processing spoken, natural-language computer commands with a natural-language speech control system in accordance with the present invention;




FIG. 2 is a flow diagram, similar to that depicted in FIG. 1, that illustrates the presently preferred embodiment of the natural-language speech control system;

FIG. 3 depicts a logical form of a parsing of a sentence produced by the presently preferred GB-based principles-and-parameters syntactic parser employed in the natural-language speech control system depicted in FIG. 2;

FIG. 4 is a flow diagram illustrating how a sentence is parsed by the presently preferred GB-based principles-and-parameters syntactic parser employed in the natural-language speech control system depicted in FIG. 2;

FIG. 5 is a block diagram depicting an alternative embodiment of a semantic compiler that converts parsed computer commands into machine code executable as a command to a digital computer program; and

FIG. 6 is a block diagram depicting a preferred embodiment of a semantic compiler that converts parsed computer commands into machine code executable as a command to a digital computer program.

Best Mode for Carrying Out the Invention 

FIG. 1 depicts a natural-language speech control system in accordance with the present invention referred to by the general reference character 20. As illustrated in FIG. 1, the natural-language speech control system 20 first processes a spoken command received as an audio signal with a robust automatic speech recognition computer program 22. The speech recognition computer program 22 produces textual digital-computer-data in the form of an ASCII text stream 24 that contains a text of the spoken words as recognized by the speech recognition computer program 22. The text stream 24 is then processed by a syntactic-parser 26 which converts the text stream 24, representing the spoken words, into a parsed sentence having a logical form 28. The logical form 28 associates a part of speech in the parsed sentence with each word in a string of words. The logical form 28 is processed by a semantic compiler 32 to generate a command in the form of a machine code 34 that is then processed by a computer program executed by a computer 36 to control its operation.

As is readily apparent to those skilled in the art, the speech recognition computer program 22, syntactic-parser 26 and semantic compiler 32 will generally be computer programs that are executed by the computer 36. Similarly, the text stream 24 and logical form 28 data in general will be stored, either temporarily or permanently, within the computer 36.
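The overall flow of FIG. 1 can be summarized in a short sketch. The code below is illustrative only; the three helper functions are hypothetical stand-ins for the speech recognition computer program 22, the syntactic-parser 26 and the semantic compiler 32, not an actual implementation of any of them.

```python
# Minimal sketch of the FIG. 1 pipeline; every function name here is a
# hypothetical stand-in, not part of any product named in this disclosure.

def recognize_speech(audio_signal: bytes) -> str:
    """Speech recognition (22): audio signal -> ASCII text stream (24)."""
    raise NotImplementedError("a commercial speech recognizer would be called here")

def parse_sentence(text_stream: str) -> dict:
    """Syntactic parsing (26): text stream (24) -> logical form (28)."""
    raise NotImplementedError("a GB-based P-and-P parser would be called here")

def compile_command(logical_form: dict) -> str:
    """Semantic compilation (32): logical form (28) -> machine code / command (34)."""
    raise NotImplementedError("an LR-grammar or mapping-table compiler would go here")

def speech_control(audio_signal: bytes) -> str:
    """End-to-end natural-language speech control (20)."""
    text_stream = recognize_speech(audio_signal)    # 22 produces 24
    logical_form = parse_sentence(text_stream)      # 26 produces 28
    command = compile_command(logical_form)         # 32 produces 34
    return command                                  # executed by the computer 36
```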

FIG. 2 is a flow diagram that depicts a presently preferred implementation of the natural-language speech control system 20. As depicted in FIG. 2, the preferred implementation of the natural-language speech control system 20 includes an error message facility 42. The error message facility 42 permits the natural-language speech control system 20 to inform the speaker of difficulties that the natural-language speech control system 20 encounters in attempting to process a spoken computer command. The error message facility 42 informs the speaker about the processing difficulty either audibly or visibly. In the specific implementation of the natural-language speech control system 20 depicted in FIG. 2, the machine code 34 produced by the semantic compiler 32 is an MS DOS command. The computer 36 executes the MS DOS command to produce a result 44 specified by the spoken command.



Speech Recognition Computer Program 22

The speech recognition computer program 22 processes the audio signal that represents spoken words to generate a string of words forming the text stream 24. A number of companies have developed computer programs for transcribing voice into text. Several companies offering such computer programs are listed below.

1. BBN, a wholly owned subsidiary of GTE, has a Unix-based speech recognizer called Hark
2. Dragon Systems markets Dragon Dictate
3. IBM markets VoiceType Dictation
4. Kurzweil Applied Intelligence
5. Microsoft Research's Speech Technology Group is developing a speech recognition engine named Whisper
6. PureSpeech
7. SRI's STAR Lab has a group developing a wideband, continuous speech recognizer called DECIPHER
8. AT&T's Advanced Speech Products Group offers a speech recognizer named WATSON.



Most of the systems identified above work with discrete speech in which a speaker must pause between words. Also, these systems require some level of speaker training to attain high-accuracy speech recognition. Ideally, a continuous speech recognizer that employs a Hidden Markov Model is to be preferred.




Of the systems listed above, the Dragon Systems speech recognizer seems to be the most robust, has been used by the United States Armed Forces in Bosnia, and is presently preferred for the natural-language speech control system 20. The Dragon Systems speech recognizer runs on an IBM PC compatible computer operating under the Microsoft Windows graphical user interface. Initial tests have demonstrated a very high degree of accuracy with a large number of speakers with unconstrained language and a variety of accents.

In general, for a single sentence or command the speech recognition computer program 22 can generate a plurality of word-vectors. Each word-vector corresponds to one spoken word in the sentence or computer command. Each word-vector includes at least one, but probably several, two-tuples consisting of a word recognized by the speech recognition computer program 22 together with a number which represents a probability estimated by the speech recognition computer program 22 that the audio signal actually contains the corresponding spoken word. Exhaustive processing of a spoken command by the syntactic-parser 26 requires that several strings of words be included in the text stream 24. Each string of words included in the text stream 24 for such exhaustive processing is assembled by concatenating successive words selected from successive word-vectors. The several strings of words in the text stream 24 to be processed by the syntactic-parser 26 are not identical because in every string at least one word differs from that in all other strings of words included in the text stream 24.
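The word-vector structure just described can be pictured as lists of (word, probability) two-tuples, one list per spoken word, from which alternative word strings are assembled for exhaustive parsing. The sketch below is illustrative only; the sample values echo the example word-vectors given later in the description, and the joint-probability scoring is an assumption added for illustration.

```python
from itertools import product

# One word-vector per spoken word; each entry is a (word, probability) two-tuple.
word_vectors = [
    [("edit", 0.90), ("a-dot", 0.50)],
    [("the", 0.70), ("da", 0.60), ("their", 0.40)],
    [("first", 0.80), ("force", 0.40), ("fast", 0.30)],
    [("document", 0.80), ("dock-meant", 0.40)],
]

def candidate_strings(vectors):
    """Concatenate one word from each successive word-vector, yielding every
    distinct string of words for exhaustive processing by the parser."""
    for choice in product(*vectors):
        words = " ".join(word for word, _prob in choice)
        score = 1.0
        for _word, prob in choice:
            score *= prob                    # crude joint likelihood of the string
        yield words, score

# Hand the highest-scoring strings to the syntactic-parser first.
ranked = sorted(candidate_strings(word_vectors), key=lambda pair: pair[1], reverse=True)
for sentence, score in ranked[:3]:
    print(f"{score:.3f}  {sentence}")
```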




Syntactic-Parser Computer Program 26 

The syntactic-parser 26 incorporated into the preferred embodiment of the natural-language speech control system 20 is based on a principles-and-parameters (P-and-P) syntactic parser, Principar. Principar has been developed by and is available from Prof. DeKang Lin at the University of Manitoba in Canada. P-and-P parsing is based on Noam Chomsky's GB-based theory of natural-language syntax. Principar's significant advantage over other natural-language-syntactic-parsers is that, with relatively few rules, it can perform deep parses of complex sentences.

The power of the P-and-P framework can be illustrated by considering how it can easily parse both Japanese and English language sentences. In English, typically the word order in a sentence is subject-verb-object, as in 'He loves reading'. But in Japanese, the order is typically subject-object-verb. Now if a GB-based parser employs a principle which states that 'sentences contain subjects, objects and verbs', and the GB-based parser's parameter for 'word-order' of sentences is subject-verb-object for English and subject-object-verb for Japanese, then the GB-based parser's principles and parameters describe a grammar for simple sentences in both English and Japanese. This is the essence of the P-and-P framework.
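A toy rendering of the principles-and-parameters idea just described: a single 'sentences contain subjects, verbs and objects' principle combined with a per-language word-order parameter covers simple sentences in both English and Japanese. The code is a minimal sketch of the idea, not Principar's implementation.

```python
# Toy P-and-P sketch: one structural principle, one word-order parameter.
WORD_ORDER = {
    "english": ("subject", "verb", "object"),     # 'He loves reading'
    "japanese": ("subject", "object", "verb"),
}

def parse_simple_sentence(words, language):
    """Assign grammatical roles to a simple sentence according to the
    word-order parameter of the chosen language."""
    order = WORD_ORDER[language]
    if len(words) != len(order):
        raise ValueError("principle violated: a sentence needs a subject, a verb and an object")
    return dict(zip(order, words))

print(parse_simple_sentence(["he", "loves", "reading"], "english"))
# {'subject': 'he', 'verb': 'loves', 'object': 'reading'}
```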

To describe the complex interactions of different sentence 
elements, the syntactic-parser 26 depicted in FIG. 2 uses the 
following principles. 

1. Case Theory: Case theory requires that every overt 
noun phrase (NP) be assigned an abstract case, such as 
nominative case for subjects, accusative case for 
direct objects, dative case for indirect objects, etc. 




2. X-bar Theory: X-bar theory describes how the syntactic structure of a sentence is formed by successively smaller units called phrases. This theory determines the word-order in sentences.

3. Movement Theory: The rule Move-α specifies that any sentence element can be moved from its base position in the underlying D-structure to anywhere else in the surface structure. Whether a particular movement is allowed depends on other constraints of the grammar. For example, the result of a movement must satisfy the X-bar schema.

4. Bounding Theory: This theory prevents the results of movement from extending too far in the sentence.

5. Binding Theory: This theory describes the structural relationship between an empty element left behind by a moved NP and the moved NP itself.

6. θ-Theory: This theory deals with the assignment of semantic roles to the NPs in a sentence.

The preceding principles, and some other more complex ones that are described by Robert C. Berwick in Principles of Principle-Based Parsing, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer Academic Publishers, pp. 1-37 (1991), are used for parsing English with Principar.

With a GB-based approach to natural-language parsing, commands to computers can be understood as verb phrases that are a sub-set of complete English sentences. The sentences have an implied second person singular pronoun subject and the verb is active voice present tense. For instance, to resume work on a previous project, one might issue to a computer the following natural-language command.

'Edit the first document on NLP-based command interpreters.'

Possible word vectors that the speech recognition computer 
program 22 might produce for the preceding sentence are set forth 
below. 



Spoken word     Recognized word and estimated probability
edit            edit 0.90; a-dot 0.50
the             the 0.70; da 0.60; their 0.40; there 0.40; them 0.20
first           first 0.80; force 0.40; fast 0.30; force 0.30; hearse 0.15; curse 0.15; purse 0.05
document        document 0.80; dock-meant 0.40
on              on 0.75; nun 0.40; an 0.35
NLP             nip 0.10
based           based 0.75; baste 0.50; paste 0.35
command         command 0.90; come-and 0.55
interpreters    interpreters 0.85; inter-porter 0.40

A parsing of the preceding sentence by Principar for the actual words appears in FIG. 3. The GB-based parse presented in FIG. 3 allows the computer to map a verb (V) into a computer command action, with the noun phrase (NP) as the object, and the adjective phrase (AP) as properties of the object.
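The mapping just described, verb to command action, noun phrase to object, and adjective phrases to properties of the object, can be sketched as follows. The dictionary layout and the helper name are illustrative assumptions that loosely follow the logical form of FIG. 3.

```python
# Illustrative logical form for 'Edit the first document on NLP-based
# command interpreters', reduced to the pieces that the mapping needs.
logical_form = {
    "V": "edit",                                 # verb -> command action
    "NP": "document",                            # noun phrase -> object acted upon
    "AP": ["first", "NLP-based", "command"],     # properties of the object
}

def map_parse_to_command(parse):
    """Map a verb-phrase parse onto an abstract computer command."""
    return {
        "action": parse["V"],
        "object": parse["NP"],
        "properties": parse.get("AP", []),
    }

print(map_parse_to_command(logical_form))
# {'action': 'edit', 'object': 'document', 'properties': ['first', 'NLP-based', 'command']}
```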






Limiting GB-based syntactic parsing to only active voice, second person verb-phrase parsing permits implementing an efficient semantic compiler 32 that allows operating a computer with computer-transcribed voice commands. Since only a sub-set of English is used for computer commands, parameters can be set to limit the number of parses generated by the syntactic-parser 26. For example, the case principle may be set to only accusative case for verb-complements, oblique case for prepositional complements and genitive case for possessive nouns or pronouns. A nominative case principle is unnecessary since the computer commands lack an express subject for the main clause. Such tuning of the principles to be applied by the syntactic-parser 26 significantly reduces the number of unnecessary parses produced by the GB-based P-and-P syntactic-parser Principar.

By using a GB-based P-and-P syntactic-parser, moving the natural-language speech control system 20 between computer applications or between computer platforms involves simply changing the lexicon and the parameters. Due to the modular framework of the grammar implemented by the syntactic-parser 26, with minor changes in parameter settings more complicated sentences, such as the following queries and implicit commands, may be parsed.

'Which files have been modified after July 4th?'
'How many words are there in this document?'
'I would like to delete all files in this directory.'



As illustrated in FIG. 4, the syntactic-parser 26 includes a set of individual principle-based parsers 52, P1 through Pn, a dynamic principle-ordering system 54, principle parameters specifiers 56, and a lexicon specifying system 58. The heart of the syntactic-parser 26 is the set of individual principle-based parsers 52. Each of the principle-based parsers 52 implements an individual principle such as those listed and described above. Each principle is abstract and each is described in a manner different from the others (i.e. the principles are heterogeneous). For instance, the X-bar theory for English states that the verb must precede the object, while the θ-theory states that every verb must discharge its θ-roles.






The various principle-based parsers 52, each implemented as a separate computer program module, formalize the preceding principles. Each principle-based parser 52 applies its principle to the input text and the legal parses which it receives from the preceding principle-based parser 52. The principle-based parser 52 then generates a set of legal parses according to the principle which it formalizes. Because the principle-based parsers 52 process an input sentence sequentially, the syntactic-parser 26 employs a set of data structures common to all the principle-based parsers 52 that allows the input text and the legal parses to be passed from one principle-based parser 52 to the next. Moreover, the syntactic-parser 26 includes a principle-ordering system 54 that controls a sequence in which individual principles, such as those summarized above, are applied in parsing a text.
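The sequential filtering behavior described above can be sketched as a chain of principle filters, each receiving the legal parses that survived the preceding principle and discarding the rest. The filter functions and the parse representation are hypothetical placeholders, not Principar's actual modules.

```python
# Sketch of sequential principle-based filtering over candidate parses.
def case_filter(parses):
    """Keep parses in which every overt NP has been assigned an abstract case."""
    return [p for p in parses if p.get("cases_assigned")]

def x_bar_filter(parses):
    """Keep parses whose phrase structure satisfies the X-bar schema."""
    return [p for p in parses if p.get("x_bar_ok")]

def theta_filter(parses):
    """Keep parses in which every verb discharges its θ-roles."""
    return [p for p in parses if p.get("theta_ok")]

# The principle-ordering system (54) controls the sequence of the filters.
PRINCIPLE_ORDER = [case_filter, x_bar_filter, theta_filter]

def parse_with_principles(candidate_parses):
    """Apply each principle filter in turn; the parses that survive are legal."""
    parses = candidate_parses
    for principle in PRINCIPLE_ORDER:
        parses = principle(parses)
    return parses
```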

To parse more than one language, each of the principle-based parsers 52 receives parameter values from the principle parameters specifiers 56. For instance, with the X-bar theory, a verb precedes the object in English, while in Japanese the object precedes the verb. Consequently, the grammar for each principle formalized in the principle-based parsers 52 needs to be dynamically generated, based on parameter values provided by the principle parameters specifiers 56.

Principar's lexicon specifying system 58 contains over 90,000 entries, extracted out of standard dictionaries. The structure of the lexicon specifying system 58 is a word-entry followed by functions representing parts-of-speech categories and other features. To properly parse computer commands, Principar's lexicon must be extended by adding recently adopted, platform-specific computer acronyms.

Semantic Compiler Computer Program 32

Parsing the text stream 24 into the logical form depicted in FIG. 3 permits the semantic compiler 32 to use a conventional LR grammar in generating the machine code 34 from the logical form 28. Parsing the text stream 24 into the canonical form is possible because commands are restricted to imperative sentences that are second person, active voice sentences that begin with a verb. Limiting the natural-language commands in this way insignificantly restricts the ability to issue voice commands. The canonical logical form of a command can be parsed into the machine code 34 by the semantic compiler 32 using a conventional lexical analyzer named LEX and a conventional compiler writer named YACC.

As indicated in FIG. 2, the preferred semantic compiler 32 has an ability to detect some semantic errors, and then send a message back to a speaker via the error message facility 42 about the specific nature of the error. An example of a semantic error would be if an action were requested that is not possible with the object. For instance, an attempt to copy a directory to a file would result in an object type mismatch, and therefore cause an error.
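To make the LR-grammar translation and the semantic-error check concrete, the sketch below hand-codes an equivalent, much simplified translation: a canonical verb-phrase logical form is matched against a tiny table of known actions and rewritten as an MS DOS command, and an unknown verb or missing argument is reported as a semantic error. The verb table and command templates are hypothetical examples, not the grammar actually used with LEX and YACC.

```python
# Hypothetical mini-grammar mapping canonical verb phrases to MS DOS commands.
VERB_RULES = {
    "copy":   "copy {source} {destination}",
    "delete": "del {target}",
    "edit":   "edit {target}",
}

def compile_logical_form(logical_form):
    """Translate {'V': verb, 'args': {...}} into an MS DOS command string;
    an unknown verb or a missing argument is reported as a semantic error."""
    verb = logical_form["V"]
    template = VERB_RULES.get(verb)
    if template is None:
        raise ValueError(f"semantic error: no rule for the verb '{verb}'")
    try:
        return template.format(**logical_form["args"])
    except KeyError as missing:
        raise ValueError(f"semantic error: missing argument {missing}")

# 'Copy all word files to John's directory.' in canonical form:
print(compile_logical_form(
    {"V": "copy", "args": {"source": "*.doc", "destination": "john"}}))
# -> copy *.doc john
```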

An alternative approach to the conventional LR grammar described above for generating the machine code 34 would be for the semantic compiler 32 to take parse trees expressed in the canonical form in the logical form 28 as input and then map them into appropriate computer commands. This would be done by a command-interpreter computer program 62 by reference to mapping tables 64 which map verbs to different actions.

Different computer programs perform the same abstract natural-language commands for similar operations. However, each computer program requires different types of commands that need to be handled uniquely. The conventional LR grammar, or the combined command-interpreter computer program 62 and mapping tables 64, permit the semantic compiler 32 to prepare operating system commands 72, word processing commands 74, spreadsheet commands 76, and/or database commands 78 from the parse trees in the logical form 28. Note that the command-interpreter computer program 62 needs to have different functionality depending on the application domain to which the command is addressed. If a computer command is directed to DOS or a Unix command shell, the operating system can directly execute the machine code 34. But the word processing commands 74, the spreadsheet commands 76, or the database commands 78 must be piped through the operating system to that specific application. To facilitate this kind of command, the natural-language speech control system 20 must run in the background, piping the machine code 34 to the current application.
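The command-interpreter-and-mapping-tables alternative, together with the per-application piping just described, might look like the following sketch. The table contents and the pipe_to_application helper are assumptions made purely for illustration.

```python
# Hypothetical mapping tables (64): the same abstract verb maps to a
# different concrete command in each application domain.
MAPPING_TABLES = {
    "operating_system": {"delete": "del {target}", "copy": "copy {source} {dest}"},
    "word_processor":   {"delete": "cut {target}", "open": "open-file {target}"},
    "spreadsheet":      {"delete": "clear-range {target}", "sum": "sum-range {target}"},
}

def pipe_to_application(application, command):
    """Stand-in for piping a command through the operating system to a
    running application; a real system would use an OS-specific mechanism."""
    print(f"[{application}] <- {command}")

def interpret(parse, application="operating_system"):
    """Command interpreter (62): look up the verb in the mapping table for the
    current application, build the concrete command, and dispatch it."""
    command = MAPPING_TABLES[application][parse["V"]].format(**parse["args"])
    if application == "operating_system":
        return command                      # the command shell can execute it directly
    pipe_to_application(application, command)
    return command

interpret({"V": "delete", "args": {"target": "draft.doc"}}, "word_processor")
```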

Industrial Applicability 
In adapting the natural-language speech control system 20 for preparing commands for execution by a variety of computer programs, the speech recognition computer program 22 and the syntactic-parser 26 are the same regardless of the computer program that will execute the command. However, as depicted in FIG. 6, the semantic compiler 32 includes a set of semantic modules 84 used for generating commands that control different computer programs. Of these semantic modules 84, there is a set of semantic modules 84 to prepare commands for controlling operating system functions. Other optional semantic modules 84 generate commands for controlling operation of different application computer programs such as the word processing commands 74, spreadsheet commands 76 and database commands 78 illustrated in FIG. 5. In addition, the semantic compiler 32 includes a set of semantic modules 84 for configuration, and for loading each specific application computer program.
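One way to picture the semantic modules 84 of FIG. 6 is as a registry of per-domain back-ends, including modules for configuration and for loading an application. The module names and the command strings they emit are illustrative assumptions only.

```python
# Hypothetical registry of semantic modules (84).
def operating_system_module(parse):
    return f"{parse['V']} {' '.join(parse['args'])}"        # e.g. an MS DOS command

def word_processing_module(parse):
    return f"wp:{parse['V']} {' '.join(parse['args'])}"     # piped to the word processor

def configuration_module(parse):
    return f"set {parse['args'][0]}={parse['args'][1]}"     # configuration command

def load_application_module(parse):
    return f"start {parse['args'][0]}"                      # load a specific application

SEMANTIC_MODULES = {
    "operating_system": operating_system_module,
    "word_processing": word_processing_module,
    "configuration": configuration_module,
    "load_application": load_application_module,
}

def compile_with_module(parse, domain):
    """Dispatch a parsed command to the semantic module for its domain."""
    return SEMANTIC_MODULES[domain](parse)

print(compile_with_module({"V": "copy", "args": ["*.doc", "john"]}, "operating_system"))
# -> copy *.doc john
```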

Although the present invention has been described in terms of the presently preferred embodiment, it is to be understood that such disclosure is purely illustrative and is not to be interpreted as limiting. While preferably the text stream 24 represents a spoken command with an ASCII text stream, as is readily apparent to those skilled in the art any digital computer representation of textual digital-computer-data may be used for expressing such data in the text stream 24. Similarly, while preferably the semantic compiler 32 employs a canonical logical form to represent computer commands parsed by the semantic compiler 32, any other representation of the parsed computer commands that provides the same informational content may be used in the semantic compiler 32 for expressing parsed commands. Consequently, without departing from the spirit and scope of the invention, various alterations, modifications, and/or alternative applications of the invention will, no doubt, be suggested to those skilled in the art after having read the preceding disclosure. Accordingly, it is intended that the following claims be interpreted as encompassing all alterations, modifications, or alternative applications as fall within the true spirit and scope of the invention.






The Claims 

What is claimed is:



1. A universal voice-command-interpretation method for producing from spoken words a command that is adapted for controlling operation of a digital computer, the method comprising the steps of:

receiving an audio signal that represents the spoken words;

processing the received audio signal to generate therefrom textual digital-computer-data that contains representations of individual spoken words;

processing the textual digital-computer-data with a natural-language-syntactic-parser to produce a parsed sentence that consists of a string of words with each word being associated with a part of speech in the parsed sentence; and

generating the command from the parsed sentence.

2. The method of claim 1 wherein the parsed sentence has 
a syntax of an implied second person singular pronoun subject and 
an active voice present tense verb. 



3. The method of claim 1 wherein processing of the audio signal to generate therefrom the textual digital-computer-data produces a plurality of word-vectors.

4 . The method of claim 3 wherein each word-vector includes 
at least one two-tuple consisting of a word together with a 
number which represents a probability that the audio signal 
actually contains that spoken word. 

5. The method of claim 3 wherein the textual digital-computer-data processed by the natural-language-syntactic-parser consists of a string of words, each successive word being selected from successive word-vectors in the plurality of word-vectors.

6. The method of claim 5 wherein in producing the command the natural-language-syntactic-parser processes at least two unidentical strings of words in which at least one word is different.

7. The method of claim 1 wherein the natural-language-syntactic-parser is a government-and-binding-based (GB-based) natural-language-syntactic-parser.

8. The method of claim 7 wherein the GB-based natural-language-syntactic-parser is a principles-and-parameters (P-and-P) syntactic parser.

9. The method of claim 1 wherein the command is generated from the parsed sentence by a semantic compiler.

10. The method of claim 9 wherein the semantic compiler uses an LR grammar in generating the command.

11. The method of claim 10 wherein the semantic compiler upon detecting a semantic error dispatches a message that describes the semantic error.

12. The method of claim 11 wherein the message describing the semantic error that is dispatched by the semantic compiler is presented audibly to a speaker.

13. The method of claim 11 wherein the message describing the semantic error that is dispatched by the semantic compiler is presented visibly to a speaker.

14. The method of claim 10 wherein the semantic compiler includes a plurality of semantic modules that respectively generate commands for controlling operation of different computer programs.

15. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates operating system commands.




16. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates application program commands.

17. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates configuration commands.

18. The method of claim 14 wherein the semantic compiler includes at least one semantic module that generates program loading commands.

19. The method of claim 1 further comprising the step of 
transmitting the command to the digital computer. 







FIG. 3

(VP
  (Vbar
    (V
      (V_NP
        (V_NP edit)
        (NP
          (Nbar
            (Det the)
            (N document)
            (PP
              (Pbar
                (P
                  (P on)
                  (NP
                    (Nbar
                      (AP
                        (N NLP)
                        (Abar
                          (A based)))
                      (AP
                        (Abar
                          (A command)))
                      (N English))))))))))))






[Drawing sheet 2/2]

FIG. 4: an input sentence is processed by the principle-based parsers P1 through P4 (52) in the order set by the principle ordering system (54), using the parameter values (56) and the lexicon (58), to produce a parsed sentence.

FIG. 5: parse trees are processed by the command-interpreter (62) with reference to the mapping tables (64) to produce an operating system command (72), a word-processor command (74), a spreadsheet command (76), or a database command (78).




INTERNATIONAL SEARCH REPORT 



International application No. PCT/US97/15388

A. CLASSIFICATION OF SUBJECT MATTER
IPC(6): G06F 17/20, 17/27; G10L 5/00
US CL: 704/1, 9, 275
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S.: 704/9, 275, 1, 8, 235, 251, 257; 395/2.44, 2.66, 2.84, 759

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
Please See Extra Sheet.

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

A   US 5,060,155 (VAN ZUULEN) 22 October 1991, abstract; col. 1, line 1 to col. 3, line 28; col. 14, line 1 to col. 17, line 36   1, 7-8, 19

A   US 5,146,406 A (JENSEN) 08 September 1992, abstract; fig. 2; col. 1, line 14 to col. 3, line 26; col. 5, line 34 to col. 6, line 31; col. 7, line 63 to col. 10, line 33   1-6, 19

Y   US 5,418,717 A (SU et al.) 23 May 1995, abstract; figs. 1-2, 3A-3C and 10A-10B; col. 1, line 20 to col. 2, line 66; col. 5, line 48 to col. 10, line 33; col. 13, line 5 to col. 20, line 46   1-6, 19
A   (same document)   7-18

[X] Further documents are listed in the continuation of Box C.   See patent family annex.


* Special categories of cited documents:
"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier document published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
"&" document member of the same patent family


Date of the actual completion of the international search: 14 OCTOBER 1997

Date of mailing of the international search report: 04 DEC 1997

Name and mailing address of the ISA/US:
Commissioner of Patents and Trademarks
Box PCT
Washington, D.C. 20231
Facsimile No. (703) 305-3230

Authorized officer: JOSEPH THOMAS
Telephone No. (703) 308-3900



Form PCT/ISA/210 (second sheet)(July 1992)* 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US97/15388



C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

A   US 5,457,768 A (TSUBOI et al.) 10 October 1995, abstract; col. 1, line 13 to col. 2, line 40; col. 4, line 13 to col. 9, line 5; col. 14, line 4 to col. 15, line 51   1-6, 9-19

A   US 5,555,169 A (NAMBA et al.) 10 September 1996, abstract; figs. 1-4; col. 1, line 31 to col. 3, line 8; col. 4, line 22 to col. 9, line 60   1-6, 19



Form PCT/ISA/210 (continuation of second sheet)(July 1992)* 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US97/15388



B. FIELDS SEARCHED 

Electronic data bases consulted (name of data base and, where practicable, search terms used):
APS

search terms: syntax/syntactic parsing, computer/speech controlled, semantic compiler, word vector, government and binding, principle and parameters



Form PCT/ISA/210 (extra sheet)(July 1992)*